A COMPARISON OF METHODS FOR OPTIMUM MATCHING OF THE
MEAN VALUES OF TWO SAMPLES

by 

Angela Mack and Maurice J. Daintith


ABSTRACT 


ito  samp  !e  rank:"..;  methods 
samples  taken  from  a stati 
INTRODUCTION


There are many situations in which the predominant need is to match two
outputs as accurately as possible, so that by subtraction a zero output
is achieved. As an example, consider a split-beam transducer, in which,
when the two halves are in opposite phase, an important feature is the depth
of the null. In this situation a small mismatch produces a disproportionate
effect upon the null. If we refer levels to that produced by
adding the two outputs, it is easily shown that (as compared with the
theoretical $-\infty$ dB for exact matching) a 1 dB mismatch gives a -26 dB
null, a 2 dB mismatch a -19 dB null, and a 3 dB mismatch a -15 dB null.
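
As a rough check of these figures, the following sketch computes the null
depth for a given mismatch. It assumes (this is not stated in the text) that
the mismatch is an amplitude ratio expressed in dB and that the null is the
difference of the two outputs referred to their sum; the name
`null_depth_db` is illustrative only. Under these assumptions the computed
depths agree with the quoted values to within about a decibel.

```python
import math


def null_depth_db(mismatch_db: float) -> float:
    """Depth of the difference null in dB, referred to the summed output.

    Assumption: the two halves have amplitudes 1 and 10**(mismatch_db / 20).
    """
    ratio = 10.0 ** (mismatch_db / 20.0)  # amplitude ratio of the two halves
    return 20.0 * math.log10(abs(ratio - 1.0) / (ratio + 1.0))


for mismatch in (1.0, 2.0, 3.0):
    print(f"{mismatch:.0f} dB mismatch -> {null_depth_db(mismatch):.1f} dB null")
```

The depth falls without limit as the mismatch tends to zero, which is the
theoretical exact-matching case.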

Matching is thus critical, but this may be a time-consuming and expensive
process. This paper compares various methods for attaining the best
matching that is, at the same time, more economical in time and cost
than high quality control or selection, by taking advantage of the fact
that we are usually dealing with arrays of elements numbering from the
order of tens to (exceptionally) hundreds.

The aim, therefore, is to match the means of two samples taken from a
statistical population or, in other words, 'how can two teams A and B be
chosen so that they are as well matched as possible?' There are various
methods of doing this:

1) Random: The samples are picked at random from the given distribution.

2) Selection: One item is taken at random from the population.
Items are then successively picked until one is found that has the same
weight as the first item chosen; the first item is then put
in A and the second in B. This is repeated until the samples are of
the desired size. This method gives samples with the same total weight.

3) Matching Sums: The samples A and B are picked so that the total
weights of the items in A and of the items in B are equal.

4) Best Matching: If the samples A and B are to be of N/2 items
each, a sample of N items is chosen at random. These are then divided
into A and B so that the weights (individual or total) are as well
matched as possible.

5) Ranking: A sample of N items is taken at random. These are
then ordered and the samples A and B are chosen from these taking into
account this ordering.

6) Linear Programming: The problem may be turned into a linear
program whereby the function to minimize is the difference of the total
weights $\sum (A_i - B_i)$ and the constraints will depend on the particular
problem.

Methods 2 and 3 obviously give the best match but are wasteful, since
many items have to be picked and then discarded before completing the
samples. They also require many measurements, since each item has to be
given a weight and then these weights have to be compared. Method 4
does not give such good results as 2 and 3 but is not wasteful and
involves fewer measurements. Method 5 requires fewer measurements than 2,
3, and 4 and, although results are not so good, can produce a definite
improvement on 1. In certain cases one may not be able to assign a weight
to each item although, using some criterion, the items can be ordered.
In situations such as these, methods 2, 3, 4, and 6 cannot be used,
leaving only random and ranking methods. When sufficient data are
available method 6 may seem a good approach, but in fact this may not be
so (see Appendix E).

This study compares two ranking methods with the random method for two
particular parent populations: rectangular and Gaussian. The two ranking
methods considered are:

a. Alternate Ranking

Having taken a sample of N items from the parent population these are
ordered

$$x_1 \le x_2 \le \dots \le x_N$$

(N is even and indicates the total number of items chosen such that each
sample A and B contains N/2 items.)

Starting from $x_1$, divide into the two samples according to the rule
A B A B A B ...

If $\Delta$ represents the difference of the sum of the weights of the items
in A and the sum of the weights of those in B then, for any rule based on
ranking,

$$\Delta = \sum_{r=1}^{N} \gamma_r x_r \qquad \text{where } \gamma_r = \pm 1$$

Here

$$\gamma_r = 1 \quad r \text{ odd (sample A)}$$
$$\gamma_r = -1 \quad r \text{ even (sample B)}$$

b. Bialternate Ranking

Repeat as for the alternate method but divide into the two samples according
to the rule

A B B A A B B A ...

For

$$\Delta = \sum_{r=1}^{N} \gamma_r x_r$$

where

$$\gamma_r = 1 \quad \text{if } r \equiv 0 \text{ or } 1 \pmod 4$$
$$\gamma_r = -1 \quad \text{if } r \equiv 2 \text{ or } 3 \pmod 4$$

Method b is expected to give better results than a. In fact, in the
alternate method, for every pair AB the larger one goes in B and
hence the sum of the B's is always greater than the sum of the A's. To
compensate for this, in every other pair the order AB is inverted to
give AB BA AB BA ... With this method the pair AB puts the larger
number in B while the pair BA puts the larger one in A, thus
reducing the difference between the sum of the A's and the sum of the
B's.
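
The two ranking rules can be sketched in a few lines. This is a minimal
illustration rather than the authors' program: the function names are
hypothetical, and the sign convention (+1 for sample A, -1 for sample B) is
the one used for the difference of the sums above.

```python
def signs_alternate(n):
    """Signs for ranks 1..n under the rule A B A B ... (+1 = A, -1 = B)."""
    return [1 if r % 2 == 1 else -1 for r in range(1, n + 1)]


def signs_bialternate(n):
    """Signs for ranks 1..n under the rule A B B A A B B A ..."""
    return [1 if r % 4 in (0, 1) else -1 for r in range(1, n + 1)]


def split(values, signs):
    """Order the values and return the two samples (A, B)."""
    ordered = sorted(values)
    sample_a = [x for x, g in zip(ordered, signs) if g == 1]
    sample_b = [x for x, g in zip(ordered, signs) if g == -1]
    return sample_a, sample_b


def difference_of_sums(values, signs):
    """Sum of the weights in A minus the sum of the weights in B."""
    return sum(g * x for g, x in zip(sorted(values), signs))


weights = [1, 2, 3, 4, 5, 6, 7, 8]
print(split(weights, signs_alternate(8)), difference_of_sums(weights, signs_alternate(8)))
print(split(weights, signs_bialternate(8)), difference_of_sums(weights, signs_bialternate(8)))
```

For equally spaced weights the alternate rule leaves B heavier by one step
per pair, while the bialternate rule cancels the difference within each
group of four.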


1 METHOD 


The problem was approached both analytically and numerically. Because
the analytic method proved difficult the numerical approach was used
both to check results and to obtain them where analytically it was
impossible.


1.1 Analytic Solution

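For each parent population considered, the requirement is to estimate the
mean value $\bar{\Delta}$ and the variance about the origin
$\overline{\Delta^2}$, where $\Delta$ has already been defined in the
Introduction. Considering a generic population of probability distribution
p(x) and denoting the cumulative integral

$$\int_{-\infty}^{x_t} p(x)\,dx \text{ by } P(x_t) \quad\text{and}\quad Q_t = 1 - P_t = \int_{x_t}^{+\infty} p(x)\,dx$$

the following general expressions are obtained (see Appendices A and B):

$$\bar{\Delta} = N! \sum_{r=1}^{N} \gamma_r \int_{-\infty}^{+\infty} y\, p_y\, \frac{P^{r-1}\, Q^{N-r}}{(r-1)!\,(N-r)!}\, dy \qquad \text{(Eq. 1)}$$

$$(P = P(y);\ Q = Q(y);\ p_y = p(y))$$

$$\overline{\Delta^2} = N\nu_2 + 2N! \sum_{r=1}^{N-1} \sum_{s=r+1}^{N} \gamma_r \gamma_s \int_{-\infty}^{+\infty} y\, p_y\, dy \int_{-\infty}^{y} z\, p_z\, dz\, \frac{P_z^{\,r-1}\, Q_y^{\,N-s}\, (P_y - P_z)^{s-r-1}}{(r-1)!\,(N-s)!\,(s-r-1)!} \qquad \text{(Eq. 2)}$$

$$(p_y = p(y);\ p_z = p(z);\ P_y = P(y);\ P_z = P(z);\ Q_y = Q(y))$$

$\nu_2$ = variance about the origin of the parent population.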


1.1.1 Rectangular Distribution


Consider the rectangular distribution

$$p(x) = \begin{cases} 1 & x \in [0, 1] \\ 0 & \text{elsewhere} \end{cases}$$

The population has a mean of 1/2 and the variance about the origin is

$$\nu_2 = \int_0^1 x^2\, p(x)\, dx = \frac{1}{3}$$

Also

$$P(x) = \int_0^x p(x)\, dx = x \quad \text{for } x \in [0, 1]$$

$$Q(x) = 1 - x$$

The limits are replaced by 1 and 0. Equations (1) and (2) become

$$\bar{\Delta} = N! \sum_{r=1}^{N} \gamma_r \int_0^1 y\, \frac{y^{r-1} (1-y)^{N-r}}{(r-1)!\,(N-r)!}\, dy$$

$$\overline{\Delta^2} = \frac{N}{3} + 2N! \sum_{r=1}^{N-1} \sum_{s=r+1}^{N} \gamma_r \gamma_s \int_0^1 y\, dy \int_0^y z\, dz\, \frac{z^{r-1} (1-y)^{N-s} (y-z)^{s-r-1}}{(r-1)!\,(N-s)!\,(s-r-1)!}$$


a. Alternate Choice

$$\gamma_r = (-1)^{r-1}$$

$$\bar{\Delta}_A = N! \sum_{r=1}^{N} \int_0^1 y\, \frac{(-y)^{r-1} (1-y)^{N-r}}{(r-1)!\,(N-r)!}\, dy$$

The summation is now a binomial, so

$$\bar{\Delta}_A = N \int_0^1 y\, (1-2y)^{N-1}\, dy .$$

Integrating by parts and taking account of the fact that N is even,

$$\bar{\Delta}_A = -\frac{N}{2(N+1)} .$$


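For the variance

$$\overline{\Delta^2_A} = \frac{N}{3} + 2N! \sum_{r=1}^{N-1} \sum_{s=r+1}^{N} (-1)^{r+s} \int_0^1 y\, dy \int_0^y z\, dz\, \frac{z^{r-1} (1-y)^{N-s} (y-z)^{s-r-1}}{(r-1)!\,(N-s)!\,(s-r-1)!}$$

but

$$(-1)^{r+s} = (-1)^{r-s} (-1)^{2s} = (-1)^{r-s} = -(-1)^{s-r-1}$$

so that

$$\overline{\Delta^2_A} = \frac{N}{3} - 2N! \int_0^1 y\, dy \int_0^y z\, dz \sum_{r=1}^{N-1} \sum_{s=r+1}^{N} \frac{z^{r-1} (1-y)^{N-s} (z-y)^{s-r-1}}{(r-1)!\,(N-s)!\,(s-r-1)!}$$

The double summation is a multinomial equal to

$$\frac{1}{(N-2)!} \left( z + (1-y) + (z-y) \right)^{N-2}$$

$$\Rightarrow \overline{\Delta^2_A} = \frac{N}{3} - 2N(N-1) \int_0^1 y\, dy \int_0^y z\, dz\, (2z - 2y + 1)^{N-2}$$

Integrating by parts for both integrals,

$$\overline{\Delta^2_A} = \frac{N}{4(N+1)} .$$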

Using these expressions for $\bar{\Delta}_A$ and $\overline{\Delta^2_A}$ the
variance about the mean $\bar{\Delta}_A$ is

$$\sigma_A^2 = \overline{\Delta^2_A} - \bar{\Delta}_A^{\,2} = \frac{N}{4(N+1)} - \frac{N^2}{4(N+1)^2} = \frac{N}{4(N+1)^2}$$
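
The alternate-choice results can be cross-checked without any integration.
The sketch below is an independent check, not the authors' derivation: it
uses the standard moments of the order statistics of N uniform [0, 1]
variates, $E[x_r] = r/(N+1)$ and
$\mathrm{Cov}(x_r, x_s) = r(N+1-s) / ((N+1)^2 (N+2))$ for $r \le s$, and the
function name is hypothetical.

```python
from fractions import Fraction


def delta_moments_uniform(signs):
    """Exact mean and variance of sum(sign_r * x_r) for ordered uniform variates."""
    n = len(signs)
    mean = sum(Fraction(g * r, n + 1) for r, g in enumerate(signs, start=1))
    variance = Fraction(0)
    for r, g_r in enumerate(signs, start=1):
        for s, g_s in enumerate(signs, start=1):
            low, high = min(r, s), max(r, s)
            cov = Fraction(low * (n + 1 - high), (n + 1) ** 2 * (n + 2))
            variance += g_r * g_s * cov
    return mean, variance


for n in (4, 6, 20):
    alternate = [1 if r % 2 == 1 else -1 for r in range(1, n + 1)]
    mean, variance = delta_moments_uniform(alternate)
    print(n, mean, variance)
```

The function accepts any list of +1/-1 signs, so the same check can be run
for other ranking rules.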


b. Bialternate Choice


The following results were obtained (Appendix C):


N/2 

even  => 

' ab 

- 0 

( N*  1 ) 

N/2 

odd 

^B 

-1 

N*  1 

’8 

- ( N*  1 ) 
2 

■ance 

about 

the 

means 

N/2 

even 

°B 

--  (N* 

l) 

N/2 

odd 

0 *■ 

B 

= ( 3N+1 

)/2(N+ 

iv 

c. Random Choice

Two samples of N/2 items are selected at random. Since the value of
$\sigma^2$ for single samples is 1/12, the variance of the sample mean is

$$\frac{1/12}{N/2} = \frac{1}{6N}$$

Then the variance of the difference of the means is

$$2 \times \frac{1}{6N} = \frac{1}{3N}$$

But the variance of the difference of the means is $\sigma_R^2 / (N/2)^2$, so

$$\sigma_R^2 = \frac{1}{3N} \cdot \frac{N^2}{4} = \frac{N}{12}$$

The results are summarized in Table 1 and shown graphically in Figs 1 & 2.


TABLE 1

RECTANGULAR DISTRIBUTION, MEAN 1/2, VARIANCE ABOUT MEAN 1/12


Method 

A 

CA 

Random 

0 

N/ 1 2 

SI 

N even 

Ai tennate 

N 

2Tn+tt 

N 

47n+tt 

i__  A 

Nv. 

N even 

Bial tennate 

0 

1 

FTn71! 

/ — - — * 

*2(N*1 ) 

N 

N even,  - even 

B i a 1 ternate 

1 

N+1 

3 

2(N+1 ) 

i y5rf*T 

N + l 2 

N even,  - odd 

2 

r 


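1.1.2 Gaussian Distribution


The distribution is

$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \qquad \text{(mean 0, variance 1)}$$

No results were obtained for $\overline{\Delta^2_A}$ and
$\overline{\Delta^2_B}$.


a. Alternate Choice

$$\gamma_r = (-1)^{r-1}$$

so Eq. 1 becomes

$$\bar{\Delta}_A = N \sum_{r=1}^{N} \frac{(N-1)!}{(r-1)!\,(N-r)!} \int_{-\infty}^{+\infty} y\, p(y)\, (-P)^{r-1} Q^{N-r}\, dy = N \int_{-\infty}^{+\infty} y\, p(y)\, (Q-P)^{N-1}\, dy$$

which, since N - 1 is odd, equals

$$-N \int_{-\infty}^{+\infty} y\, p(y)\, (P-Q)^{N-1}\, dy$$

Now y is an odd function; the distribution, being normal, is an even
function; P(-y) = Q(y), so P - Q is odd. Hence for N even the
integrand is an even function and so

$$|\bar{\Delta}_A| = 2N \int_0^{+\infty} y\, p(y)\, (P-Q)^{N-1}\, dy$$

and this is unequal to zero.

(For an approximation of this integral see Appendix D.)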


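b. Bialternate Choice

$$\bar{\Delta}_B = N \sum_{r=1}^{N} \gamma_r \frac{(N-1)!}{(r-1)!\,(N-r)!} \int_{-\infty}^{+\infty} y\, p(y)\, P^{r-1} Q^{N-r}\, dy$$

Consider

$$\int_{-\infty}^{0} \sum_{r=1}^{N} \gamma_r \frac{(N-1)!}{(r-1)!\,(N-r)!}\, y\, p(y)\, P^{r-1} Q^{N-r}\, dy$$

and write -y for y:

$$\int_{+\infty}^{0} \sum_{r=1}^{N} \gamma_r \frac{(N-1)!}{(r-1)!\,(N-r)!}\, y\, p(y)\, Q^{r-1} P^{N-r}\, dy$$

Put N - r = r' - 1:

$$\int_{+\infty}^{0} \sum_{r'=1}^{N} \gamma_{N-r'+1} \frac{(N-1)!}{(r'-1)!\,(N-r')!}\, y\, p(y)\, P^{r'-1} Q^{N-r'}\, dy$$

If $N \equiv 0 \pmod 4$ then $\gamma_{N-r'+1} = \gamma_{r'}$.

Hence the two integrals, over $(-\infty, 0)$ and over $(0, +\infty)$, are of
equal magnitude but opposite sign and so

$$\bar{\Delta}_B = 0 .$$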


c. Random Choice

The expected value of $\Delta$ will obviously be 0. For a normal distribution
and a variance of $\sigma^2$ the standard deviation of the averages of samples
of n items is

$$\frac{\sigma}{\sqrt{n}}$$

so

$$\sigma \sqrt{\frac{2}{n}} = \text{standard deviation of the difference of the sample means.}$$

Here $\sigma = 1$ and $n = N/2$, so this standard deviation is $2/\sqrt{N}$.


1.2 Numerical Solution

The solution was based on a Monte Carlo method. For the rectangular
distribution a random-number generator was used to generate uniformly
distributed random numbers $u_i$ in the range $u_i \in [0, 1]$ using the
formula

$$u_i = \text{fractional part of } (\pi + u_{i-1})^5$$

For the Gaussian distribution this was modified to generate pairs of
normal random deviates with mean 0 and variance 1 using method 3 on
p. 953 of Ref. 1. In each case, between 20 and 30 groups of N random
numbers were generated. For the random selection each group was split
into two, giving the two samples A and B with N/2 numbers in each. The
sample means of A and B were calculated and then the difference of
these means. Finally the average and standard deviation of these
differences were found. For the ranking methods the groups of numbers were
ordered and then divided into the samples A and B, first using the alternate
ranking method and secondly using the bialternate ranking method. As for
the random selection, the differences of the sample means were calculated
and then the average and standard deviation of these. Tables 2 & 3
indicate the results obtained.


TABLE 2

RESULTS OBTAINED FOR A RECTANGULAR DISTRIBUTION
MEAN 1/2, VARIANCE 1/12

METHOD         N     MEAN              STANDARD DEVIATION

Random         4     -0.02             0.25
Alternate      4     -0.22             0.10
Bialternate    4      0.01             0.14

Random         6     -0.01             0.29
Alternate      6     -0.16             0.06
Bialternate    6     -0.04             0.07

Random         20    -2.25 x 10^-3     0.10
Alternate      20    -0.04             0.02
Bialternate    20     2.45 x 10^-3     0.02

TABLE 3

RESULTS OBTAINED FOR A GAUSSIAN DISTRIBUTION
MEAN 0, VARIANCE 1

METHOD         N     MEAN      STANDARD DEVIATION

Random         4      0.06     0.61
Alternate      4     -0.50     0.35
Bialternate    4      0.01     0.33

Random         8     -0.11     0.49
Alternate      8     -0.32     0.12
Bialternate    8      0.01     0.14

Random         20     0.03     0.33
Alternate      20    -0.14     0.05
Bialternate    20    -0.01     0.05

Figures  3 to  8 were  drawn  from  the  numerical  data  obtained.  For  each  one 
a frequency  table  of  the  difference  of  the  sample  means  for  a given  N was 
constructed  and  this  was  then  used  to  construct  the  histogram. 
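
The numerical procedure of this section can be sketched as follows. This is
an illustration under stated assumptions, not the original program: it uses
Python's built-in generator (`random.Random`) in place of the generator
described above, the function and variable names are hypothetical, and with
only 30 groups per run the printed figures will scatter from run to run just
as the tabulated ones do.

```python
import random
import statistics


def split_by_rank(ordered, in_sample_a):
    """Split ranked values into (A, B) using a rule on the 1-based rank r."""
    a = [x for r, x in enumerate(ordered, start=1) if in_sample_a(r)]
    b = [x for r, x in enumerate(ordered, start=1) if not in_sample_a(r)]
    return a, b


def simulate(n_items, n_groups, draw, seed=0):
    """Return {method: (mean, std dev)} of the difference of the sample means."""
    rng = random.Random(seed)
    rules = {
        "alternate": lambda r: r % 2 == 1,         # A B A B ...
        "bialternate": lambda r: r % 4 in (0, 1),  # A B B A A B B A ...
    }
    diffs = {"random": [], "alternate": [], "bialternate": []}
    half = n_items // 2
    for _ in range(n_groups):
        group = [draw(rng) for _ in range(n_items)]
        # Random method: the unsorted group is simply cut in two.
        diffs["random"].append(
            statistics.mean(group[:half]) - statistics.mean(group[half:]))
        ordered = sorted(group)
        for name, rule in rules.items():
            a, b = split_by_rank(ordered, rule)
            diffs[name].append(statistics.mean(a) - statistics.mean(b))
    return {k: (statistics.mean(v), statistics.stdev(v)) for k, v in diffs.items()}


rectangular = simulate(20, 30, lambda rng: rng.random())
gaussian = simulate(20, 30, lambda rng: rng.gauss(0.0, 1.0))
for label, result in (("rectangular", rectangular), ("Gaussian", gaussian)):
    for method, (mean, sd) in result.items():
        print(f"{label:12s} {method:12s} mean={mean:+.3f} sd={sd:.3f}")
```

Raising `n_groups` well beyond 30 narrows the scatter and makes the
comparison with the analytic results sharper.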


2 DISCUSSION


2.1  Comparison  of  Analytical  and  Numerical  Results 


Comparing the numerical and analytical data it must be remembered what has
been calculated in each case. Analytically, calling $\Delta$ the difference of
the sum of the weights of items in A and weights of items in B
($= X_1 - X_2$), expressions for

$$\bar{\Delta} = \int_{-\infty}^{+\infty} \Delta\, p(\Delta)\, d\Delta \qquad \text{(mean)}$$

and

$$\overline{\Delta^2} = \int_{-\infty}^{+\infty} \Delta^2\, p(\Delta)\, d\Delta \qquad \text{(variance about origin)}$$

have been found.

Numerically, taking the two samples A and B, $(X_1 - X_2)/(N/2)$ was
calculated and this was repeated for the 20 to 30 sets of samples. Finally
the mean and standard deviation of these were calculated:

$$\text{mean} = \frac{\bar{\Delta}}{N/2} \qquad \text{standard deviation} = \frac{\sigma_\Delta}{N/2}$$

and these are the values that we are investigating, i.e. the parameters of
the distribution of the difference of the sample means.


Consider the rectangular distribution, mean 1/2, variance about the mean 1/12.
From Table 1 we should expect to obtain the results given in Table 4 and
this is consistent with the numerical results obtained (Table 2). Numerical
results obtained for the means relative to the Gaussian distribution also
compare favourably with analytical results. In fact, from Table 3, the
means for the random and bialternate methods are near zero, as they should be.
From Appendix D, an approximation for $|\bar{\Delta}|$ in the alternate
method is

$$|\bar{\Delta}_A| \approx N \sqrt{\frac{2}{\pi}} \times 0.928 \times \left( \frac{N}{2} + \frac{\pi}{8} \right)^{-\pi/4}$$

Evaluating for N = 4, N = 8 and N = 20 we obtain 1.49, 1.84, 2.34
respectively, which give means of 0.75, 0.46 and 0.23.
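
The approximation from Appendix D is easy to evaluate directly. In this
sketch the constant 0.928 is replaced by `math.gamma(1 + math.pi / 4)`,
which is an assumption about how that constant arises (it follows the
Gamma-function identity used in Appendix D); the function name is
illustrative. The results come out close to, though not digit-for-digit
equal to, the values quoted above.

```python
import math


def alternate_mean_gaussian_approx(n: int) -> float:
    """Closed-form approximation to |mean of Delta| for the alternate rule,
    Gaussian parent population (mean 0, variance 1)."""
    constant = math.gamma(1.0 + math.pi / 4.0)  # about 0.928
    return (n * math.sqrt(2.0 / math.pi) * constant
            * (n / 2.0 + math.pi / 8.0) ** (-math.pi / 4.0))


for n in (4, 8, 20):
    delta = alternate_mean_gaussian_approx(n)
    # Dividing by N/2 converts the difference of sums to a difference of means.
    print(f"N={n:2d}  |Delta|={delta:.2f}  difference of means={delta / (n / 2):.2f}")
```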


FIG. 3 UNIFORM DISTRIBUTION, RANDOM METHOD: a) N = 4, b) N = 6
(histograms of frequency against difference of means)

FIG. 4 UNIFORM DISTRIBUTION, ALTERNATE METHOD: a) N = 4, b) N = 6
(histograms of frequency against difference of means)

FIG. 6 GAUSSIAN DISTRIBUTION, RANDOM METHOD

FIG. 7 GAUSSIAN DISTRIBUTION, ALTERNATE METHOD

FIG. 8 GAUSSIAN DISTRIBUTION, BIALTERNATE METHOD


TABLE 4


~ N STANDARD  DEVIATION  - 


>•  P'st*"  but  ’on 

»-ent  distribution  with  a mean  of  \ and 
ese  can  be  generalized  to  a standard 
le  5 


TABLE 5


-N  . 3 o'  3N  o - 


/(N+1 


o . N 


v 3N  o / 

/ N»1 


0 6oMn*i) 


6 

''  * N^T 


N 

- even 


’8  o\ 


(N*l) 


o n odd 

N + 1 


Figures 3 to 8 give an idea of the frequency distributions of
$\Delta/(N/2)$, i.e. the frequency distribution of the difference of the
sample means, for various values of N ($\Delta$ in the alternate ranking
method is taken to be positive), and they will be of the types shown in Fig. 9.


a)  Random  b)  Alternate  c)  Bialternate  d)  Bialternate 


FIG.  9 PARENT  POPULATION  RECTANGULAR 


Alternate b) and Bialternate d) show a displaced mean. From Fig. 1
both these tend to 0 for $N \to \infty$, in b) as 1/N and in d) as $1/N^2$.


a)  Random  b)  Alternate  c)  Bialternate 


FIG.  10  PARENT  POPULATION  GAUSSIAN 

These show the advantages of the bialternate ranking method. It gives
a distribution closely centred about a mean that is either 0 or almost
zero. Taking into consideration the numerical results for N = 20, if
the parent population is Gaussian the bialternate ranking method gives
a standard deviation of 0.05. Samples of 1600 items would have to be
taken in order to have such a result using the random method.
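
The figure of 1600 items can be reproduced with a short calculation. The
sketch assumes two independent random samples of N/2 items each from a
population of standard deviation sigma, so that the difference of the sample
means has standard deviation $2\sigma/\sqrt{N}$; the function name is
hypothetical.

```python
import math


def items_needed_random(target_sd: float, sigma: float = 1.0) -> int:
    """Smallest even total N for which 2 * sigma / sqrt(N) <= target_sd."""
    # The small guard avoids rounding up on floating-point noise.
    n = math.ceil((2.0 * sigma / target_sd) ** 2 - 1e-9)
    return n + (n % 2)  # N must be even so that both samples hold N/2 items


print(items_needed_random(0.05))
```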

Figure 2a shows that when the parent population is rectangular $\sigma^2$
increases linearly with N. This means a standard deviation that decreases as
$1/\sqrt{N}$. Figure 2b shows that in the alternate ranking process the
standard deviation $\to 0$ as $N \to \infty$ as 1/N, while Figs 2c & d show
that in the bialternate method the standard deviation $\to 0$ as
$1/(N\sqrt{N})$, which are much more desirable results.

Not having theoretical results for the standard deviation in the case of
a Gaussian parent population, it is difficult to say what will happen
when N is large.
An indication of the real value of the standard deviation may be had by
calculating a confidence interval. This can be done using $\chi^2$ for a
confidence interval with a level of confidence of 95%:

$$\left( \frac{s \sqrt{n}}{\chi_{0.975}},\ \frac{s \sqrt{n}}{\chi_{0.025}} \right)$$

where

s = standard deviation of a sample taken from the distribution

n = number in sample (this is not N, but the total number of runs
done for each N, i.e. 20 to 30)


Table 6 is obtained.


TABLE 6

N      RANDOM          ALTERNATE       BIALTERNATE

4      (0.49, 0.84)    (0.28, 0.48)    (0.27, 0.46)
8      (0.38, 0.74)    (0.09, 0.18)    (?, 0.21)
20     (0.26, 0.50)    (0.04, 0.08)    (0.04, 0.08)

This table shows that, for each method, the standard deviation decreases
as N increases and for the ranking methods it is much smaller than for
the random method. It would appear also that there is little difference
between the alternate and bialternate methods as regards the standard
deviation.
APPENDICES


APPENDIX A


CALCULATION OF $\bar{\Delta}$ FOR A POPULATION OF PROBABILITY DISTRIBUTION p(x)


Consider a population of probability distribution p(x). Take N samples
of values $x_1, x_2, \dots, x_N$. The probability that one of these lies
between $x_r$ and $x_r + dx_r$ is $p(x_r)\,dx_r$ and therefore the chance that
the first item is at $x_1$, the second at $x_2$, etc. is
$p(x_1)\, p(x_2) \dots p(x_N)\, dx_1 \dots dx_N$.

But there are N! permutations of the N x's so that the probability of a
random sample of N having these values is

$$N!\, p(x_1)\, p(x_2) \dots p(x_N)\, dx_1 \dots dx_N$$

Ordering the random sample, $x_1 \le x_2 \le \dots \le x_N$ (condition 1).
To calculate $\bar{\Delta}$, $\sum_{r=1}^{N} \gamma_r x_r$ must be integrated
over all values of x subject to condition 1:

$$\bar{\Delta} = N! \int \!\! \int \dots \int p(x_1) \dots p(x_N) \sum_{r=1}^{N} \gamma_r x_r\, dx_1 \dots dx_N$$

Choose the order of integration for the term $\gamma_r x_r$ as
$(dx_1\, dx_2 \dots dx_{r-1})\, (dx_{r+1}\, dx_{r+2} \dots dx_N)\, dx_r$ and so

$$\bar{\Delta} = N! \sum_{r=1}^{N} \gamma_r \int_{-\infty}^{+\infty} I_r\, J_r\, x_r\, p(x_r)\, dx_r$$

where

$$I_r = \int_{-\infty}^{x_r} p(x_{r-1})\, dx_{r-1} \int_{-\infty}^{x_{r-1}} p(x_{r-2})\, dx_{r-2} \dots \int_{-\infty}^{x_2} p(x_1)\, dx_1$$

and

$$J_r = \int_{x_r}^{+\infty} p(x_{r+1})\, dx_{r+1} \int_{x_{r+1}}^{+\infty} p(x_{r+2})\, dx_{r+2} \dots \int_{x_{N-1}}^{+\infty} p(x_N)\, dx_N$$
Lemma 1

$$I_r = \frac{P^{r-1}(x_r)}{(r-1)!}$$

Proof

$$I_2 = \int_{-\infty}^{x_2} p(x_1)\, dx_1 = P(x_2)$$

so the result is true for r = 2. Generally assume

$$I_s = \frac{P^{s-1}(x_s)}{(s-1)!}$$

Then, integrating by parts,

$$I_{s+1} = \int_{-\infty}^{x_{s+1}} p(x_s)\, \frac{P^{s-1}(x_s)}{(s-1)!}\, dx_s = \frac{P^{s}(x_{s+1})}{(s-1)!} - (s-1) \int_{-\infty}^{x_{s+1}} \frac{P^{s-1}(x_s)}{(s-1)!}\, p(x_s)\, dx_s = \frac{P^{s}(x_{s+1})}{(s-1)!} - (s-1)\, I_{s+1}$$

$$\Rightarrow I_{s+1} = \frac{P^{s}(x_{s+1})}{s!}$$

so by induction the result is true for all s.


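Lemma 2

$$J_r = \frac{Q^{N-r}(x_r)}{(N-r)!}$$

Proof

$$J_{N-1} = \int_{x_{N-1}}^{+\infty} p(x_N)\, dx_N = Q(x_{N-1})$$

so the lemma is true for r = N - 1. Assume

$$J_s = \frac{Q^{N-s}(x_s)}{(N-s)!}$$

Then, integrating by parts,

$$J_{s-1} = \int_{x_{s-1}}^{+\infty} p(x_s)\, \frac{Q^{N-s}(x_s)}{(N-s)!}\, dx_s = \frac{Q^{N-s+1}(x_{s-1})}{(N-s)!} - (N-s) \int_{x_{s-1}}^{+\infty} \frac{Q^{N-s}(x_s)}{(N-s)!}\, p(x_s)\, dx_s = \frac{Q^{N-s+1}(x_{s-1})}{(N-s)!} - (N-s)\, J_{s-1}$$

$$(N-s+1)\, J_{s-1} = \frac{Q^{N-s+1}(x_{s-1})}{(N-s)!}$$

$$\Rightarrow J_{s-1} = \frac{Q^{N-s+1}(x_{s-1})}{(N-s+1)!}$$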


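Applying the lemmas

$$\bar{\Delta} = N! \sum_{r=1}^{N} \gamma_r \int_{-\infty}^{+\infty} x_r\, p(x_r)\, \frac{P^{r-1}(x_r)}{(r-1)!}\, \frac{Q^{N-r}(x_r)}{(N-r)!}\, dx_r$$

so, writing y for $x_r$,

$$\bar{\Delta} = N! \sum_{r=1}^{N} \gamma_r \int_{-\infty}^{+\infty} y\, p(y)\, \frac{P^{r-1}}{(r-1)!}\, \frac{Q^{N-r}}{(N-r)!}\, dy$$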
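
As a sanity check on this expression, the sketch below evaluates it exactly
for the rectangular population on [0, 1], where P(y) = y and Q(y) = 1 - y.
It relies on the standard result
$\int_0^1 y^a (1-y)^b\, dy = a!\, b! / (a+b+1)!$ for non-negative integers,
and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact value of the integral of y**a * (1 - y)**b over [0, 1]."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def mean_delta_uniform(signs) -> Fraction:
    """Mean of sum(sign_r * x_r) from the general formula, rectangular parent."""
    n = len(signs)
    total = Fraction(0)
    for r, g in enumerate(signs, start=1):
        weight = Fraction(factorial(n), factorial(r - 1) * factorial(n - r))
        # The integrand is y * y**(r-1) * (1-y)**(n-r) = y**r * (1-y)**(n-r).
        total += g * weight * beta_integral(r, n - r)
    return total


alternate = [1, -1, 1, -1]
print(mean_delta_uniform(alternate))
```

Each term reduces to sign_r * r / (N + 1), the expected value of the r-th
ordered uniform variate.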
APPENDIX B


CALCULATION OF $\overline{\Delta^2}$ FOR A POPULATION OF PROBABILITY DISTRIBUTION p(x)


$$\overline{\Delta^2} = N! \int \!\! \int \dots \int \Big[ \sum_{r=1}^{N} \gamma_r x_r \Big]^2 p(x_1) \dots p(x_N)\, dx_1 \dots dx_N$$

Since $\gamma_r^2 = 1$, expanding the squared term

$$\overline{\Delta^2} = N! \int \!\! \int \dots \int \Big[ \sum_{r=1}^{N} x_r^2 + 2 \sum_{r=1}^{N-1} \sum_{s>r} \gamma_r \gamma_s\, x_r x_s \Big] p(x_1) \dots p(x_N)\, dx_1 \dots dx_N$$

Consider these terms separately. For the integral of $\sum x_r^2$, the same
technique as for $\bar{\Delta}$ may be used, leading to

$$N! \int_{-\infty}^{+\infty} \sum_{r=1}^{N} p\, y^2\, \frac{P^{r-1}\, Q^{N-r}}{(r-1)!\,(N-r)!}\, dy$$

But since the $\gamma_r$ terms do not appear, the summation becomes a single
binomial

$$(P + Q)^{N-1} = \sum_{r=1}^{N} \frac{(N-1)!}{(r-1)!\,(N-r)!}\, P^{r-1} Q^{N-r}$$

whence, since P + Q = 1, this term becomes

$$N \int_{-\infty}^{+\infty} p\, y^2\, dy = N \nu_2 ,$$

where $\nu_2$ is the variance about the origin of the original population.
For the second integral, integrate in the order

$$(dx_1\, dx_2 \dots dx_{r-1})\, (dx_{r+1}\, dx_{r+2} \dots dx_{s-1})\, (dx_{s+1}\, dx_{s+2} \dots dx_N)\, dx_r\, dx_s$$

So the integral becomes

$$2 \sum_{r=1}^{N-1} \sum_{s=r+1}^{N} \gamma_r \gamma_s\, N! \int_{-\infty}^{+\infty} x_s\, p(x_s)\, dx_s \int_{-\infty}^{x_s} x_r\, p(x_r)\, dx_r\; I_r\, J_s\, K_{rs} ,$$

where $I_r$, $J_s$ have the meanings previously used while

$$K_{rs} = \int_{x_r}^{x_s} p(x_{s-1})\, dx_{s-1} \int_{x_r}^{x_{s-1}} p(x_{s-2})\, dx_{s-2} \dots \int_{x_r}^{x_{r+2}} p(x_{r+1})\, dx_{r+1}$$

Now

$$K_{r,r+2} = \int_{x_r}^{x_{r+2}} p(x_{r+1})\, dx_{r+1} = P(x_{r+2}) - P(x_r)$$

$$K_{r,r+3} = \int_{x_r}^{x_{r+3}} \left[ P(x_{r+2}) - P(x_r) \right] p(x_{r+2})\, dx_{r+2} = \frac{\left[ P(x_{r+3}) - P(x_r) \right]^2}{2}$$

and generally

$$K_{rs} = \frac{\left[ P(x_s) - P(x_r) \right]^{s-r-1}}{(s-r-1)!}$$

Hence the second integral becomes

$$2N! \sum_{r=1}^{N-1} \sum_{s=r+1}^{N} \gamma_r \gamma_s \int_{-\infty}^{+\infty} x_s\, p(x_s)\, dx_s \int_{-\infty}^{x_s} x_r\, p(x_r)\, dx_r\, \frac{P^{r-1}(x_r)\, Q^{N-s}(x_s)\, \left[ P(x_s) - P(x_r) \right]^{s-r-1}}{(r-1)!\,(N-s)!\,(s-r-1)!}$$

Replace $x_s$ with y and $x_r$ with z, so

$$\overline{\Delta^2} = N \nu_2 + 2N! \sum_{r=1}^{N-1} \sum_{s=r+1}^{N} \gamma_r \gamma_s \int_{-\infty}^{+\infty} y\, p_y\, dy \int_{-\infty}^{y} z\, p_z\, dz\, \frac{P^{r-1}(z)\, Q^{N-s}(y)\, \left[ P(y) - P(z) \right]^{s-r-1}}{(r-1)!\,(N-s)!\,(s-r-1)!}$$
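APPENDIX C


CALCULATION OF $\bar{\Delta}$ AND $\overline{\Delta^2}$ WHERE PARENT POPULATION IS RECTANGULAR AND
THE METHOD USED IS THE BIALTERNATE RANKING METHOD


An expression for $\gamma_r$ is given by

$$\gamma_r = \sqrt{2} \cos \left[ \pi/4 + (r-1)\,\pi/2 \right] = \sqrt{2}\, \mathrm{Re}\; e^{i\pi/4}\, e^{i(\pi/2)(r-1)}$$

where Re = real part of the complex expression.

$$\bar{\Delta} = N!\; \mathrm{Re}\, \sqrt{2} \sum_{r=1}^{N} \int_{-\infty}^{+\infty} p(y)\, y\, e^{i\pi/4}\, \frac{\left( P(y)\, e^{i\pi/2} \right)^{r-1} Q(y)^{N-r}}{(r-1)!\,(N-r)!}\, dy$$

but

$$p(y) = \begin{cases} 1 & y \in [0, 1] \\ 0 & y \notin [0, 1] \end{cases} \qquad P(y) = y\ ;\quad Q(y) = 1 - y$$

$$\bar{\Delta} = N!\; \mathrm{Re}\, \sqrt{2} \sum_{r=1}^{N} \int_0^1 y\, e^{i\pi/4}\, \frac{\left( y\, e^{i\pi/2} \right)^{r-1} (1-y)^{N-r}}{(r-1)!\,(N-r)!}\, dy = N\, \mathrm{Re}\, \sqrt{2}\, e^{i\pi/4} \int_0^1 y \left( 1 - y + y\, e^{i\pi/2} \right)^{N-1} dy$$

$$e^{i\pi/4} = \frac{1}{\sqrt{2}} (1 + i)$$

$$\bar{\Delta} = N\, \mathrm{Re}\, (1+i) \int_0^1 y\, (1 + \alpha y)^{N-1}\, dy$$

where

$$\alpha = e^{i\pi/2} - 1 = i - 1$$

Integrating by parts

$$\bar{\Delta} = N\, \mathrm{Re}\, (1+i) \left[ \frac{(1+\alpha)^N}{\alpha N} - \frac{(1+\alpha)^{N+1} - 1}{\alpha^2 N (N+1)} \right]$$

Since N is even and $1 + \alpha = i$,

$$\bar{\Delta} = N\, \mathrm{Re}\, (1+i) \left[ \frac{(-1)^{N/2}}{\alpha N} - \frac{i\,(-1)^{N/2}}{\alpha^2 N (N+1)} + \frac{1}{\alpha^2 N (N+1)} \right]$$

$$\frac{1}{\alpha} = \frac{1}{i-1} = -\frac{1+i}{2} \qquad \frac{1}{\alpha^2} = \frac{2i}{4} = \frac{i}{2}$$

$$\bar{\Delta} = N\, \mathrm{Re}\, (1+i) \left[ -\frac{(1+i)(-1)^{N/2}}{2N} + \frac{(-1)^{N/2}}{2N(N+1)} + \frac{i}{2N(N+1)} \right]$$

$$= N\, \mathrm{Re} \left[ -\frac{i\,(-1)^{N/2}}{N} + \frac{(1+i)(-1)^{N/2}}{2N(N+1)} + \frac{i - 1}{2N(N+1)} \right]$$

$$= \frac{(-1)^{N/2}}{2(N+1)} - \frac{1}{2(N+1)}$$

If N/2 is even $\bar{\Delta} = 0$

If N/2 is odd $\bar{\Delta} = -\dfrac{1}{N+1}$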
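
The cosine expression for the bialternate signs and the two cases for the
mean can be checked numerically. The sketch below is illustrative only: it
combines the two cases (0 when N/2 is even, -1/(N+1) when N/2 is odd) into
the single form $((-1)^{N/2} - 1) / (2(N+1))$, which is an assumption of the
sketch, and compares it with a direct sum using $E[x_r] = r/(N+1)$ for
ordered uniform variates. The function names are hypothetical.

```python
import math


def sign_from_cosine(r: int) -> int:
    """Bialternate sign of rank r from sqrt(2) * cos(pi/4 + (r - 1) * pi/2)."""
    return round(math.sqrt(2.0) * math.cos(math.pi / 4 + (r - 1) * math.pi / 2))


def mean_delta_closed_form(n: int) -> float:
    """Mean of Delta for the bialternate rule, rectangular parent, N even."""
    return ((-1) ** (n // 2) - 1) / (2 * (n + 1))


def mean_delta_direct(n: int) -> float:
    """Same mean from the expected values r / (N + 1) of the ordered variates."""
    return sum(sign_from_cosine(r) * r for r in range(1, n + 1)) / (n + 1)


print([sign_from_cosine(r) for r in range(1, 9)])
for n in (4, 6, 8, 10):
    print(n, mean_delta_closed_form(n), mean_delta_direct(n))
```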


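To find $\overline{\Delta^2}$ an expression for $\gamma_r \gamma_s$ must be found:

$$\gamma_r \gamma_s = 2 \cos \left( \pi/4 + (r-1)\,\pi/2 \right) \cos \left( \pi/4 + (s-1)\,\pi/2 \right)$$

$$= \cos \left( \pi/2 + (r+s-2)\,\pi/2 \right) + \cos \left( (s-r)\,\pi/2 \right)$$

$$= \cos \left( (s+r-1)\,\pi/2 \right) + \cos \left( (s-r)\,\pi/2 \right)$$

$$= \mathrm{Re} \left[ e^{i(\pi/2)(s+r-1)} + e^{i(\pi/2)(s-r)} \right]$$

$$\overline{\Delta^2} = \frac{N}{3} + 2N!\; \mathrm{Re} \sum_{r=1}^{N-1} \sum_{s=r+1}^{N} \int_0^1 y\, dy \int_0^y z\, dz \left( e^{i(\pi/2)(s+r-1)} + e^{i(\pi/2)(s-r)} \right) \frac{z^{r-1} (1-y)^{N-s} (y-z)^{s-r-1}}{(r-1)!\,(N-s)!\,(s-r-1)!}$$

$$e^{i(\pi/2)(s+r-1)} = e^{i\pi(r-1)}\, e^{i(\pi/2)(s-r-1)}\, e^{i\pi}$$

$$e^{i(\pi/2)(s-r)} = e^{i(\pi/2)(s-r-1)}\, e^{i\pi/2}$$

$$\overline{\Delta^2} = \frac{N}{3} + 2N!\; \mathrm{Re} \sum_{r=1}^{N-1} \sum_{s=r+1}^{N} \int_0^1 y\, dy \int_0^y z\, dz \left[ e^{i\pi}\, \frac{(z\, e^{i\pi})^{r-1} (1-y)^{N-s} \left( (y-z)\, e^{i\pi/2} \right)^{s-r-1}}{(r-1)!\,(N-s)!\,(s-r-1)!} + e^{i\pi/2}\, \frac{z^{r-1} (1-y)^{N-s} \left( (y-z)\, e^{i\pi/2} \right)^{s-r-1}}{(r-1)!\,(N-s)!\,(s-r-1)!} \right]$$

which using the multinomial theorem becomes

$$\overline{\Delta^2} = \frac{N}{3} + 2N(N-1)\; \mathrm{Re} \int_0^1 y\, dy \int_0^y \left[ e^{i\pi} \left( 1 + y (e^{i\pi/2} - 1) + z (e^{i\pi} - e^{i\pi/2}) \right)^{N-2} + e^{i\pi/2} \left( 1 + y (e^{i\pi/2} - 1) + z (1 - e^{i\pi/2}) \right)^{N-2} \right] z\, dz$$

$$e^{i\pi} = -1 \qquad e^{i\pi/2} = i$$

$$\overline{\Delta^2} = \frac{N}{3} + 2N(N-1)\; \mathrm{Re} \int_0^1 y\, dy \int_0^y z\, dz \left\{ -\left[ 1 + y(i-1) - z(1+i) \right]^{N-2} + i \left[ 1 + y(i-1) + z(1-i) \right]^{N-2} \right\}$$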


The first term is


-2NIN-1 ) Re  f]  y dy  /*  z dz [ 1 +y ( 1 -1 ) -z( i +1 )] 


N-2 


Integrating  over  z by  parts  gives 


-2N(  N 


\ y ~21N~-1)  fy  fl  +y ( i -1  )-z( i + l N" 1 dz 

! o ( 1 + i ) 0 ' 


- ( i + n n 


- (N-1)(1-1)(1-2>)N"1  - i(N-n  [d-2y)N  - (1+y(i-l))N] 


The  integration  over  y for  the  first  term  gives 

(H-IN1-0  / yP-?,)N-’  <J,  ■ iN-^l  - )»*’.,] 

0 2N  4N(N+1 ) 


N is even and only the real part is required. This is

The  second  term  is 

2iN(N-M  f]  y dy  /y  z dz  [1  +y(  1- 1 ) + z(  1 -i  )]^  2 


Integration over z gives
Ny ( i - 1 )♦!  - [1+y(i-!)]N 

Integrating this over y and taking the real part gives

iN/2+1 


J*  ♦ 1 ♦ ilL 

3 2 2(N+1 ) 

(N-n  N t l + (~1)N/241  a 2*(-1)W/2*] 


-(N-U 

2(N+1 ) 


y - N/3 


2(N*1 ) 3 2 2(N+l ) 


2(N+i ) 


If  N/2  is  even  AJ  = - (N+l) 

If  N/2  IS  odd  A J = 2 (N+l ) 

2 
APPENDIX D


AN APPROXIMATION OF $\bar{\Delta}$ WHERE THE ALTERNATE RANKING METHOD
IS USED AND THE PARENT POPULATION IS GAUSSIAN


The integral to approximate is

$$|\bar{\Delta}| = 2N \int_0^{+\infty} y\, p(y)\, (P-Q)^{N-1}\, dy = 2N \int_0^{+\infty} y\, \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, (2P-1)^{N-1}\, dy$$

This was done both numerically and theoretically. The numerical
approximation was obtained using the generalized Simpson Rule and results were

N = 2    $|\bar{\Delta}|$ = 1.13

N = 4    $|\bar{\Delta}|$ = 1.47
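
The numerical evaluation can be repeated with a composite Simpson rule. This
is a sketch under stated assumptions rather than the original calculation:
the infinite upper limit is truncated at y = 10, the cumulative normal
distribution is taken from `math.erf`, and the function names and step count
are illustrative.

```python
import math


def normal_pdf(y: float) -> float:
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_cdf(y: float) -> float:
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def alternate_mean_gaussian(n: int, upper: float = 10.0, steps: int = 2000) -> float:
    """2N * integral of y p(y) (2P - 1)**(N - 1) over [0, upper], Simpson rule.

    `steps` must be even; the tail of the integrand beyond `upper` is neglected.
    """
    h = upper / steps

    def integrand(y: float) -> float:
        return y * normal_pdf(y) * (2.0 * normal_cdf(y) - 1.0) ** (n - 1)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return 2.0 * n * total * h / 3.0


for n in (2, 4):
    print(n, round(alternate_mean_gaussian(n), 2))
```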


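For the theoretical approximation use p. 933 of Ref. 1:

$$P(x) \approx \tfrac{1}{2} + \tfrac{1}{2} \left( 1 - e^{-2x^2/\pi} \right)^{1/2}$$

So

$$|\bar{\Delta}| \approx \frac{2N}{\sqrt{2\pi}} \int_0^{+\infty} y\, e^{-y^2/2} \left( 1 - e^{-2y^2/\pi} \right)^{(N-1)/2} dy$$

Change the variable

$$e^{-2y^2/\pi} = t \qquad \text{so} \qquad y\, dy = -\frac{\pi}{4}\, \frac{dt}{t}$$

Hence

$$|\bar{\Delta}| \approx \frac{N}{2} \sqrt{\frac{\pi}{2}} \int_0^1 t^{\pi/4 - 1}\, (1-t)^{(N-1)/2}\, dt$$

and this integral is equal to the beta function

$$B \left( \frac{\pi}{4}, \frac{N+1}{2} \right) = \frac{\Gamma(\pi/4)\, \Gamma \left( \frac{N+1}{2} \right)}{\Gamma \left( \frac{N+1}{2} + \frac{\pi}{4} \right)} \qquad \text{(p. 258 of Ref. 1).}$$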
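At this point the tabulated $\Gamma$ values can be used or else a further
approximation using Stirling's formula (p. 257 of Ref. 1):

$$\lg \Gamma \left( \frac{N+1}{2} \right) = \frac{N}{2} \lg \frac{N+1}{2} - \frac{N+1}{2} + \frac{1}{2} \lg 2\pi + \frac{1}{6(N+1)} + O \left( \frac{1}{N^3} \right)$$

$$\lg \Gamma \left( \frac{N+1}{2} + \frac{\pi}{4} \right) = \left( \frac{N}{2} + \frac{\pi}{4} \right) \lg \left( \frac{N+1}{2} + \frac{\pi}{4} \right) - \frac{N+1}{2} - \frac{\pi}{4} + \frac{1}{2} \lg 2\pi + \frac{1}{6(N + 1 + \pi/2)} + O \left( \frac{1}{N^3} \right)$$

Hence

$$\lg \frac{\Gamma \left( \frac{N+1}{2} \right)}{\Gamma \left( \frac{N+1}{2} + \frac{\pi}{4} \right)} = \frac{N}{2} \lg \frac{N+1}{2} - \left( \frac{N}{2} + \frac{\pi}{4} \right) \lg \left( \frac{N+1}{2} + \frac{\pi}{4} \right) + \frac{\pi}{4} + \frac{1}{6(N+1)} - \frac{1}{6(N + 1 + \pi/2)} + O \left( \frac{1}{N^3} \right)$$

Now write $N = u + \gamma$ where $\gamma$ is a constant to be determined. The
reason for this is that, as will be seen, by suitably choosing $\gamma$, the
logarithms may be expanded to give the best available accuracy. Retaining
only terms of the order of 1/N,

$$\frac{N}{2} \lg \frac{N+1}{2} - \left( \frac{N}{2} + \frac{\pi}{4} \right) \lg \left( \frac{N+1}{2} + \frac{\pi}{4} \right) = \frac{u+\gamma}{2} \lg \frac{u+\gamma+1}{2} - \frac{u+\gamma+\pi/2}{2} \lg \frac{u+\gamma+1+\pi/2}{2}$$

$$= \frac{u+\gamma}{2} \left[ \lg \frac{u}{2} + \lg \left( 1 + \frac{\gamma+1}{u} \right) \right] - \frac{u+\gamma+\pi/2}{2} \left[ \lg \frac{u}{2} + \lg \left( 1 + \frac{\gamma+1+\pi/2}{u} \right) \right]$$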
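So the term in 1/u is identically zero if $\gamma = -\pi/4$, so that
$N = u - \pi/4$ and the lg term becomes $-\frac{\pi}{4} \lg (N/2 + \pi/8)$, i.e.

$$\frac{\Gamma \left( \frac{N+1}{2} \right)}{\Gamma \left( \frac{N+1}{2} + \frac{\pi}{4} \right)} \approx \left( \frac{N}{2} + \frac{\pi}{8} \right)^{-\pi/4}$$

so

$$|\bar{\Delta}| \approx \frac{N}{2} \sqrt{\frac{\pi}{2}}\; \Gamma(\pi/4) \left( \frac{N}{2} + \frac{\pi}{8} \right)^{-\pi/4}$$

also

$$\Gamma(1 + \pi/4) = \frac{\pi}{4}\, \Gamma(\pi/4)$$

and

$$\Gamma(1 + \pi/4) = 0.928$$

$$\Rightarrow |\bar{\Delta}| \approx \frac{N}{2} \sqrt{\frac{\pi}{2}} \times \frac{4}{\pi} \times 0.928 \times \left( \frac{N}{2} + \frac{\pi}{8} \right)^{-\pi/4} = N \sqrt{\frac{2}{\pi}} \times 0.928 \times \left( \frac{N}{2} + \frac{\pi}{8} \right)^{-\pi/4}$$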


APPENDIX E


A COMMENT ON LINEAR PROGRAMMING


It has already been said that linear programming cannot be used when only
ranking data are available; but even if there are sufficient data the
technique may not produce satisfactory results. First of all, the solution
may not be unique and so a criterion has to be found for selecting the best
of the optimal solutions. Secondly, the linear program minimizes the
difference of the means of the two samples, but this may be at the cost of
an even spread of the items in the two samples, as in the following case.

With a linear program the following optimal solution is found:

  A       B

 0.01    0.30
 0.01    0.37
 0.02    0.40
 0.04    0.44
 0.84    0.51
 0.89    0.56
 0.93    0.57
 0.99    0.58

(samples taken from a rectangular distribution,
mean 1/2, variance about mean 1/12)

The difference of the means is 0 but sample A contains small values
and large ones while B contains all the intermediate values. The ranking
methods assure a more even spread of values in the samples, although the
difference of the means may not be 0, and generally speaking this situation
is more desirable.
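
The point can be made concrete with the sixteen values of the example. The
short sketch below (variable names are illustrative) confirms that the two
samples have the same mean while their spreads differ greatly; the
population standard deviation from Python's `statistics` module is used as a
convenient measure of spread, which is a choice of this sketch rather than of
the text.

```python
import statistics

sample_a = [0.01, 0.01, 0.02, 0.04, 0.84, 0.89, 0.93, 0.99]
sample_b = [0.30, 0.37, 0.40, 0.44, 0.51, 0.56, 0.57, 0.58]

mean_gap = statistics.mean(sample_a) - statistics.mean(sample_b)
spread_a = statistics.pstdev(sample_a)  # extremes only: large spread
spread_b = statistics.pstdev(sample_b)  # intermediate values: small spread

print(f"difference of means = {mean_gap:.3f}")
print(f"spread of A = {spread_a:.2f}, spread of B = {spread_b:.2f}")
```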